feat(batch): support spill hash agg for the batch query #16771
Maybe we need to test the case where the file system holding the spill directory is full? And how do we GC the spilled data? 🤔
If the disk is full, the IO should throw an error. When the query is aborted, it will release the disk space it occupies.
Yes. And maybe we need to clean up the spilled files somewhere when the batch task is dropped.
Overall LGTM
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
When HashAggExecutor is told memory is insufficient, AggSpillManager will start to partition the hash table and spill it to disk. After spilling the hash table, AggSpillManager will consume all chunks from the input executor, then partition and spill them to disk with the same hash function used for the hash table spilling. Finally, we would get e.g. 20 partitions. Each partition should contain a portion of the original hash table and input data. A sub HashAggExecutor would be used to consume the partitions one by one. If memory is still not enough in the sub HashAggExecutor, it will partition its hash table and input recursively.
SpillOp is used to manage the spill directory of the spilling executor, and it drops the directory in RAII style.
RW_BATCH_SPILL_DIR would be used to configure the path to spill to; the default is /tmp/.
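To illustrate the two ideas above (routing by a shared hash function, and RAII cleanup of the spill directory), here is a minimal standalone sketch. It is not the code in this PR: `PARTITIONS`, `partition_of`, `spill_root`, and `SpillDir` are hypothetical names, and mixing the recursion level into the hash is an assumption about how a sub executor could re-split a partition. Only the `RW_BATCH_SPILL_DIR` variable and its `/tmp/` default come from the description.

```rust
use std::collections::hash_map::DefaultHasher;
use std::env;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical partition count; the description only says "e.g. 20 partitions".
const PARTITIONS: usize = 20;

/// Route a group key to a partition. Spilled hash table entries and spilled
/// input rows must both go through this same function, so that every row of a
/// group ends up in the partition that holds the group's partial aggregate.
/// Assumption: `level` is mixed in so a recursive spill splits keys differently.
fn partition_of<K: Hash>(key: &K, level: u64) -> usize {
    let mut hasher = DefaultHasher::new();
    level.hash(&mut hasher);
    key.hash(&mut hasher);
    (hasher.finish() % PARTITIONS as u64) as usize
}

/// Read the spill root from RW_BATCH_SPILL_DIR, falling back to /tmp/.
fn spill_root() -> PathBuf {
    PathBuf::from(env::var("RW_BATCH_SPILL_DIR").unwrap_or_else(|_| "/tmp/".to_string()))
}

/// Hypothetical RAII guard: owns one spill directory and removes it on drop.
struct SpillDir {
    path: PathBuf,
}

impl SpillDir {
    /// Create `root/name` (and any missing parents) and take ownership of it.
    fn create_in(root: &Path, name: &str) -> io::Result<Self> {
        let path = root.join(name);
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }
}

impl Drop for SpillDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored because drop cannot fail.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

A caller would create the guard with something like `SpillDir::create_in(&spill_root(), "some-query-id")` and keep it alive for as long as the executor is spilling. Dropping the guard, including on an aborted query, removes the directory and everything spilled into it.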
Checklist
./risedev check (or alias, ./risedev c)
Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.